An automated approach for clustering an ensemble of NMR-derived protein structures into conformationally related subfamilies.
Authors
Abstract
Unlike structures determined by X-ray crystallography, which are deposited in the Brookhaven Protein Data Bank (Abola et al., 1987) as a single structure, each NMR-derived structure is often deposited as an ensemble containing many structures, each consistent with the restraint set used. However, there is often a need to select a single 'representative' structure, or a 'representative' subset of structures, from such an ensemble. This is useful, for example, in the case of homology modelling or when compiling a relational database of protein structures. It has been shown that cluster analysis, based on overall fold, followed by selection of the structure closest to the centroid of the largest cluster, is likely to identify a structure more representative of the ensemble than the commonly used minimized average structure (Sutcliffe, 1993). Two approaches to the problem of clustering ensembles of NMR-derived structures have been described. One of these (Adzhubei et al., 1995) performs the pairwise superposition of all structures using Cα atoms to generate a set of r.m.s. distances. After cluster analysis based on these distances, a user-defined cut-off is required to determine the final membership of clusters and therefore the representative structures. The other approach (Diamond, 1995) uses collective superpositions and rigid-body transformations. Again, the position at which to draw a cut-off based on the particular clustering pattern was not addressed. Whenever fixed values are used for the cut-off in clustering, there is a danger of missing 'true' clusters under the threshold imposed by the rigid cut-off value. Considering the highly diverse nature of NMR-derived ensembles of proteins, it would seem most appropriate to avoid the use of predefined values for determining clusters. In fact, of the 302 ensembles we have studied, the average pairwise r.m.s. distance across an ensemble varied from 0.29 to 11.3 Å (mean value 3.0 Å, SD 1.9 Å).
Here we present an automated method for cut-off determination that avoids the dangers of using fixed values for this purpose. We have developed a computer program that automatically, systematically and rapidly (i) clusters an ensemble of structures into a set of conformationally related subfamilies, and (ii) selects a representative structure from each cluster. The program uses the method of average linkage to define how clusters are built up, followed by the application of a penalty function that seeks to minimize simultaneously the number of clusters...
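The procedure outlined in the abstract (average-linkage clustering, a penalty function to choose the cut-off, then one representative per cluster) can be sketched roughly as follows. This is a minimal illustration, not the authors' program. The abstract text here breaks off before naming the second quantity in the penalty, so the sketch assumes it is the within-cluster spread (mean pairwise distance), rescaled to the range 1 to n-1 so that it is comparable with the cluster count. The rescaling, the use of the member with the smallest summed distance to its cluster-mates as a stand-in for "closest to the centroid", and all function names are assumptions made for illustration. The toy input uses 1-D coordinates in place of r.m.s. distances from pairwise superposition.

```python
from itertools import combinations


def mean_link(dist, a, b):
    """Average-linkage distance: mean of all cross-cluster pair distances."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def average_linkage_stages(dist):
    """Return the partition after every merge, from n singletons to 1 cluster."""
    clusters = [[i] for i in range(len(dist))]
    stages = [clusters]
    while len(clusters) > 1:
        # Merge the two clusters with the smallest average-linkage distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: mean_link(dist, clusters[p[0]], clusters[p[1]]))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        stages.append(clusters)
    return stages


def spread(dist, cluster):
    """Mean pairwise distance inside one cluster (needs at least 2 members)."""
    pairs = list(combinations(cluster, 2))
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


def choose_stage(dist, stages):
    """Pick the stage minimizing (rescaled average spread + number of clusters).

    ASSUMPTION: the spread term and its rescaling to [1, n-1] are illustrative;
    the source text is cut off before describing the penalty in full.
    """
    n = len(dist)
    scored = []
    for part in stages:
        multi = [c for c in part if len(c) > 1]
        if multi:  # stages made only of singletons have no spread to score
            avg = sum(spread(dist, c) for c in multi) / len(multi)
            scored.append((avg, part))
    if not scored:
        return stages[0]
    lo = min(s for s, _ in scored)
    hi = max(s for s, _ in scored)
    scale = (n - 2) / (hi - lo) if hi > lo else 0.0
    return min(scored, key=lambda sp: 1 + (sp[0] - lo) * scale + len(sp[1]))[1]


def representative(dist, cluster):
    """Member with the smallest summed distance to the rest of its cluster."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))


# Toy example: six "structures" on a line, forming two well-separated groups.
coords = [0, 1, 3, 20, 22, 25]
dist = [[abs(x - y) for y in coords] for x in coords]
best = choose_stage(dist, average_linkage_stages(dist))
clusters = sorted(sorted(c) for c in best)
reps = [representative(dist, c) for c in clusters]
print(clusters, reps)
```

On this toy input the penalty is lowest at the two-cluster stage, so no fixed distance threshold has to be supplied. With a real ensemble, `dist` would instead hold the pairwise r.m.s. distances obtained after superposition.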
Similar resources
An automated approach for defining core atoms and domains in an ensemble of NMR-derived protein structures.
A single NMR-derived protein structure is usually deposited as an ensemble containing many structures, each consistent with the restraint set used. The number of NMR-derived structures deposited in the Protein Data Bank (PDB) is increasing rapidly. In addition, many of the structures deposited in an ensemble exhibit variation in only some regions of the structure, often with the majority of the...
Designing and analyzing the structure of Tat-BoNT/A(1-448) fusion protein: An in silico approach
Clostridium botulinum type A (BoNT/A) produces a neurotoxin recently found to be useful as an injectable drug for the treatment of abnormal muscle contractions. The catalytic domain of this toxin which is responsible for the main toxin activity is a zinc metalloprotease that inhibits the release of neurotransmitter mediators in neuromuscular junctions. A cell penetrating cationic peptide, Tat, ...
Optimum Ensemble Classification for Fully Polarimetric SAR Data Using Global-Local Classification Approach
In this paper, a proposed ensemble classification for fully polarimetric synthetic aperture radar (PolSAR) data using a global-local classification approach is presented. In the first step, to perform the global classification, the training feature space is divided into a specified number of clusters. In the next step to carry out the local classification over each of these clusters, which cont...
A new ensemble clustering method based on fuzzy cmeans clustering while maintaining diversity in ensemble
Ensemble clustering has been considered one of the research approaches in data mining, pattern recognition, machine learning and artificial intelligence over the last decade. In clustering, the combination first produces several base clusterings, and then a function is used to aggregate them into a final clustering that is as similar as possible to all the cluster bundles. The inp...
Improving Accuracy in Intrusion Detection Systems Using Classifier Ensemble and Clustering
Recently, with the development of technology, the number of network-based services is increasing, and sensitive information of users is shared through the Internet. Accordingly, large-scale malicious attacks on computer networks could cause severe disruption to network services, so cybersecurity has become a major concern for networks. An intrusion detection system (IDS) could be cons...
Journal title:
- Protein engineering
Volume 9, Issue 11
Pages -
Publication date: 1996